

#### **Microkernel Construction** I.2 – Threads, System Calls, Thread Switching

Lecture Summer Term 2017 Wednesday 15:45-17:15 R131, 50.34 (INFO)

#### Jens Kehne, Marius Hillenbrand Operating Systems Group, Department of Computer Science









Thread

Address space

# What is a thread? How to implement it?

What conclusions can we draw from our analysis with respect to μ-kernel construction?

#### **Thread Properties**





#### **Construction Conclusion**



Thread state must be saved/restored on thread switch
→ We need a Thread Control Block (TCB) per thread

TCBs must be kernel objects
(At least partially. We have found some good reasons to implement parts of the TCB in user memory ($\rightarrow$ IPC).)

TCBs implement threads

- We often need to find
  - The TCB of any thread using its global ID
  - The TCB of the currently executing thread (per processor)
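The two TCB lookups named above can be sketched in C as follows. This is only an illustration under simplifying assumptions: a fixed-size table indexed by global ID and one "current" pointer per processor. The names `tcb_table`, `current_tcb`, `tcb_from_id`, `tcb_current` and `tcb_create_and_run` as well as both size constants are hypothetical and not taken from any particular kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_THREADS 64   /* hypothetical table size */
#define MAX_CPUS    4    /* hypothetical processor count */

typedef struct tcb {
    uint32_t global_id;  /* global thread ID */
    int      in_use;     /* nonzero once the TCB has been allocated */
} tcb_t;

static tcb_t  tcb_table[MAX_THREADS];   /* indexed by global ID */
static tcb_t *current_tcb[MAX_CPUS];    /* one "current" pointer per processor */

/* Lookup 1: TCB of any thread from its global ID (NULL if unknown). */
tcb_t *tcb_from_id(uint32_t id)
{
    if (id >= MAX_THREADS || !tcb_table[id].in_use)
        return NULL;
    return &tcb_table[id];
}

/* Lookup 2: TCB of the thread currently executing on the given processor. */
tcb_t *tcb_current(unsigned cpu)
{
    assert(cpu < MAX_CPUS);
    return current_tcb[cpu];
}

/* Helper for this sketch: allocate a TCB and mark it as running on cpu. */
tcb_t *tcb_create_and_run(uint32_t id, unsigned cpu)
{
    assert(id < MAX_THREADS && cpu < MAX_CPUS);
    tcb_table[id].global_id = id;
    tcb_table[id].in_use = 1;
    current_tcb[cpu] = &tcb_table[id];
    return &tcb_table[id];
}
```

Both lookups are constant-time here, which matters because they sit on the path of nearly every kernel operation.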

















Figure labels: user mode, thread A, kernel

Jens Kehne, Marius Hillenbrand – Microkernel Construction, SS 2017











- Thread A is running in user mode
- Thread A reaches the end of its time slice or is preempted by a (device) interrupt
- We enter kernel mode
- The microkernel saves the status of thread A in A's TCB
- The microkernel loads the status of thread B from B's TCB
- We leave kernel mode
- Thread B is running in user mode
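The save and load steps in the sequence above can be modelled in plain C. This is a simulation, not kernel code: `cpu_state_t` stands in for the real register file, and all type, field and function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal user-visible CPU state. */
typedef struct cpu_state {
    uint32_t ip, sp, flags;
    uint32_t gpr[4];
} cpu_state_t;

typedef struct tcb {
    cpu_state_t saved;   /* state of the thread while it is not running */
} tcb_t;

/* Save the running thread's state in its TCB, then load the next thread's. */
void thread_switch(cpu_state_t *cpu, tcb_t *from, tcb_t *to)
{
    assert(cpu != NULL && from != NULL && to != NULL);
    from->saved = *cpu;   /* save status of thread A in A's TCB   */
    *cpu = to->saved;     /* load status of thread B from B's TCB */
}
```

After `thread_switch(&cpu, &a, &b)` the modelled processor holds B's state, and a later `thread_switch(&cpu, &b, &a)` resumes A exactly where it stopped.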









IP



How to save user-mode state when switching to kernel?

How do we know which kernel thread



























#### Thread Switch with single kernel stack





#### **Construction Conclusion**



From the view of the designer there are two alternatives:

## **Single Kernel Stack**

- Only one stack is used in kernel mode all the time
- seL4, OKL4

## **Per-Thread Kernel Stack**

- Each thread has its own stack in kernel mode
- Pistachio, Fiasco.OC

#### Single Kernel Stack: per processor, event model



- Either continuations
  - Complex to program
- Or stateless kernel
  - No kernel threads, kernel not interruptible, difficult to program
  - Structurally inefficient system calls
  - \+ Kernel can be exchanged on-the-fly
    - E.g. the Fluke kernel from Utah
- Low cache and TLB footprint
  - The same stack is always used!
- Stack can be larger
  - Easier to use recursion in the kernel!
- Easier to prove

#### Multiple Kernel Stacks: per thread, activity model



- Kernel can always use threads, no special methods required for keeping state while interrupted/blocked
- No conceptual difference between kernel mode and user mode

Larger cache and TLB footprint

#### **Conclusion:**

We have to look for a solution that minimizes the kernel stack size

Limited kernel stack size

#### **Conclusion:**

We have to avoid recursion in the kernel (→ Mapping)



## **KERNEL ENTRY AND EXIT ON IA-32**





points to the currently running thread's kernel stack










- Trap/fault occurs (int n / exception / interrupt)
  - Push user SS:ESP onto kernel stack, load kernel SS:ESP
  - Push user EFLAGS, reset flags (I := 0, CPL := 0)
  - Push user CS:EIP, load kernel entry CS:EIP
- Push X: error code (hw, at exception) or kernel-call type
- Push registers (optional)

hardware-programmed, single "instruction"
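The resulting kernel stack layout can be written down as a C struct. The sketch below assumes the push order listed above and leaves out the optional general-purpose registers. Because the stack grows downward, the value pushed last (X) sits at the lowest address and comes first in the struct. The names `entry_frame_t`, `simulate_entry` and `read_frame` are hypothetical, and the pushes are simulated on an ordinary array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Frame as seen from the kernel stack pointer after entry (lowest address first). */
typedef struct entry_frame {
    uint32_t x;        /* error code (hw) or kernel-call type (sw) */
    uint32_t eip;      /* user EIP    */
    uint32_t cs;       /* user CS     */
    uint32_t eflags;   /* user EFLAGS */
    uint32_t esp;      /* user ESP    */
    uint32_t ss;       /* user SS     */
} entry_frame_t;

/* Simulate the pushes on a downward-growing stack of 32-bit words.
 * Returns the new (lower) stack pointer. */
uint32_t *simulate_entry(uint32_t *kernel_sp, uint32_t ss, uint32_t esp,
                         uint32_t eflags, uint32_t cs, uint32_t eip, uint32_t x)
{
    assert(kernel_sp != NULL);
    *--kernel_sp = ss;       /* push user SS:ESP  */
    *--kernel_sp = esp;
    *--kernel_sp = eflags;   /* push user EFLAGS  */
    *--kernel_sp = cs;       /* push user CS:EIP  */
    *--kernel_sp = eip;
    *--kernel_sp = x;        /* push X            */
    return kernel_sp;
}

/* Copy the words at sp into a frame struct (memcpy avoids aliasing issues). */
entry_frame_t read_frame(const uint32_t *sp)
{
    entry_frame_t f;
    assert(sp != NULL);
    memcpy(&f, sp, sizeof f);
    return f;
}
```

Kernel exit pops the same frame in reverse order, which is what `iret` does for the hardware-pushed part.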









# **THREAD SWITCH ON IA-32**

#### Locating the TCB
















03.05.2017





#### Switching Threads (IA-32, per-thread stack)

- int \$0x0, push registers of blue thread
- Switch kernel stacks (store and load ESP)
- Set ESP0 to new kernel stack
- Pop red registers, return to red user thread (iret)
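The two middle steps (switch kernel stacks, set ESP0) can be modelled in C. In a real kernel this is a few lines of assembly. Here the ESP register and the TSS are plain variables so that the logic can be followed and tested. All names (`tcb_t`, `tss_t`, `saved_ksp`, `kstack_top`, `switch_kernel_stack`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct tcb {
    uintptr_t saved_ksp;    /* kernel stack pointer while switched out */
    uintptr_t kstack_top;   /* top of this thread's kernel stack       */
} tcb_t;

/* Only the TSS field used in this sketch. */
typedef struct tss {
    uintptr_t esp0;         /* stack loaded by hardware on kernel entry */
} tss_t;

/* cpu_esp models the ESP register while running in kernel mode. */
void switch_kernel_stack(uintptr_t *cpu_esp, tss_t *tss, tcb_t *from, tcb_t *to)
{
    assert(cpu_esp != NULL && tss != NULL && from != NULL && to != NULL);
    from->saved_ksp = *cpu_esp;   /* store ESP of the outgoing (blue) thread */
    *cpu_esp = to->saved_ksp;     /* load ESP of the incoming (red) thread   */
    tss->esp0 = to->kstack_top;   /* next kernel entry uses the red stack    */
}
```

The registers pushed at kernel entry stay on the outgoing thread's own stack, so nothing has to be copied. The switch itself only exchanges one pointer and updates ESP0.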

### Thread Switch (IA-32, per-thread stack)









- int \$0x0, push registers of blue thread
- Find blue continuation
- Move registers of blue thread to continuation
- Restore red IP/SP/Flags from continuation
- Restore red registers, return to red user thread (iret)
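With a single kernel stack and continuations, the state is copied rather than left on a per-thread stack. The following C model sketches that sequence under assumptions: the layout of `kstack_frame_t` and `continuation_t`, the register count, and the function name are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_GPRS 8   /* hypothetical number of general-purpose registers */

/* Per-thread continuation: where a thread's user state lives while it is switched out. */
typedef struct continuation {
    uint32_t ip, sp, flags;
    uint32_t gpr[NUM_GPRS];
} continuation_t;

/* What kernel entry left on the one shared kernel stack. */
typedef struct kstack_frame {
    uint32_t gpr[NUM_GPRS];
    uint32_t ip, flags, sp;
} kstack_frame_t;

void switch_via_continuation(kstack_frame_t *frame,
                             continuation_t *blue, const continuation_t *red)
{
    assert(frame != NULL && blue != NULL && red != NULL);

    /* Move registers of blue thread to its continuation. */
    memcpy(blue->gpr, frame->gpr, sizeof blue->gpr);
    blue->ip    = frame->ip;
    blue->sp    = frame->sp;
    blue->flags = frame->flags;

    /* Restore red IP/SP/flags from its continuation. */
    frame->ip    = red->ip;
    frame->sp    = red->sp;
    frame->flags = red->flags;

    /* Restore red registers; the return to user mode then uses this frame. */
    memcpy(frame->gpr, red->gpr, sizeof frame->gpr);
}
```

Compared with the per-thread-stack variant, each switch copies the full register set twice, but only one kernel stack is ever touched.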

#### Kernel preemption with single stack





Where to save kernel state (stack + regs)?

- Kernel stack? Stack size unbounded with nested interrupts
- Continuation? Might as well have per-thread stacks
- User-mode stack? What could possibly go wrong...?

#### What about other registers?



- So far, we have only considered general purpose registers
  - What about FPU, SSE?
- Extremely expensive
  - IA-32's full SSE2 state is 512 Bytes
  - IA-64's floating point state is ~1.5 KiB
- Saving/restoring must be extremely efficient
- Need a place to store FP state
  - WAY too large for TCB/Kernel stack

#### Hardware to the rescue



- x86 has HW support for saving/restoring FP state
  - xsave *addr* $\rightarrow$ Store FP state to addr
  - xrstor *addr* $\rightarrow$ Restore FP state from addr
  - Addr is called the xsave area
- EDX:EAX contain info on what to save (bitmap)

| Reserved | AVX-512 | MPX | AVX | SSE | x87 |
|----------|---------|-----|-----|-----|-----|

- Bitmap stored in xsave area
- xsave area can be anywhere
  - Pointer kept in TCB
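As a small illustration of the bitmap, the sketch below builds a 64-bit component mask and splits it into the EDX:EAX pair that the instruction expects. The bit positions follow the x86 state-component numbering (x87 = bit 0, SSE = bit 1, AVX = bit 2, MPX = bits 3-4, AVX-512 = bits 5-7). The macro names and the function `xsave_mask_split` are hypothetical, and no xsave instruction is actually executed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* State-component bits of the xsave bitmap (macro names are made up here). */
#define XSTATE_X87     (1ull << 0)
#define XSTATE_SSE     (1ull << 1)
#define XSTATE_AVX     (1ull << 2)
#define XSTATE_MPX     (3ull << 3)   /* bits 3-4 */
#define XSTATE_AVX512  (7ull << 5)   /* bits 5-7 */

/* Split a 64-bit component bitmap into the EDX:EAX register pair. */
void xsave_mask_split(uint64_t components, uint32_t *eax, uint32_t *edx)
{
    assert(eax != NULL && edx != NULL);
    *eax = (uint32_t)(components & 0xFFFFFFFFu);   /* low 32 bits  */
    *edx = (uint32_t)(components >> 32);           /* high 32 bits */
}
```

A kernel that only wants the legacy FP and SSE state would pass `XSTATE_X87 | XSTATE_SSE`, which yields EAX = 3 and EDX = 0.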

## Xsave optimizations



- xsave/xrstor provide highly efficient save/restore
  - But: Still a lot of data to copy
- Init optimization:
  - If FU is in initial state (=unused), do not save
- Modified optimization:
  - If register was not modified since last xrstor, do not save
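The effect of the two optimizations can be expressed as a bitmap computation: a component is written only if it was requested, is not in its initial state, and has been modified since the last restore. The sketch below models just that decision. The processor tracks this internally, so the struct `xstate_tracking_t` and the function `components_to_save` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tracking state, one bit per state component. */
typedef struct xstate_tracking {
    uint64_t in_use;     /* bit set: component is NOT in its initial state   */
    uint64_t modified;   /* bit set: component changed since the last xrstor */
} xstate_tracking_t;

/* Return the subset of requested components that actually has to be written. */
uint64_t components_to_save(const xstate_tracking_t *t, uint64_t requested)
{
    assert(t != NULL);
    uint64_t after_init_opt = requested & t->in_use;   /* init optimization     */
    return after_init_opt & t->modified;               /* modified optimization */
}
```

For a thread that never touched the FPU, `in_use` is 0 and nothing is copied at all, which is the common case the optimizations target.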



## Summary

## TCBs

- Implement threads
- Must store thread state while preempted

## Kernel stacks

- Either per thread (large TLB footprint, no recursion)
- Or per core (need continuations, no kernel preemption)

## Thread switch

- Switch kernel stack
- Or switch state on kernel stack